Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors
Wang, Jinping, Gao, Zhiqiang, Zhang, Dinggen, Xie, Zhiwu
Current methods for editing pre-trained models face significant challenges, primarily high computational costs and limited scalability. Task arithmetic has recently emerged as a promising solution: it applies simple arithmetic operations, addition and negation, to task vectors (the differences between fine-tuned and pre-trained model weights) to efficiently modify model behavior. However, the full potential of task arithmetic remains underexplored, primarily because it lacks mechanisms for overcoming optimization stagnation. To address this challenge, we introduce the notion of a difference vector, a generalized form of the task vector derived from historical movements during optimization. Using difference vectors as directed perturbations, we propose the Difference Vector-based Anisotropic Scaling Iterative algorithm (DV-BASI) to enable a continuous optimization process for task arithmetic methods without relying on any additional modules or components. Notably, by leveraging the escapability and directional advantages of difference vectors, the multi-task model merged by DV-BASI may even outperform individually fine-tuned models in average performance across tasks. Based on this observation, we extend difference vectors to a feasible fine-tuning method for single-task models. On the practical side, DV-BASI allows expressive search directions with few learnable parameters and forms a scalable framework. We also integrate DV-BASI with task arithmetic methods and advanced optimization techniques to achieve state-of-the-art performance under both supervised and unsupervised evaluation protocols.
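The arithmetic at the heart of this abstract is simple enough to sketch directly. Below is a minimal, hypothetical illustration of task-vector addition and negation over flat weight arrays; the toy values and the single scaling coefficient lam are assumptions for illustration, and DV-BASI itself additionally searches anisotropic scalings and iterates with difference vectors, which is not shown here:

```python
import numpy as np

# Pre-trained weights and two fine-tuned variants (toy, flattened).
theta_pre = np.array([0.5, -1.0, 2.0])
theta_ft_a = np.array([0.7, -1.1, 2.4])
theta_ft_b = np.array([0.4, -0.6, 2.1])

# A task vector is the difference between fine-tuned and pre-trained weights.
tau_a = theta_ft_a - theta_pre
tau_b = theta_ft_b - theta_pre

# Addition: merge both tasks into one model with a shared scaling coefficient.
lam = 0.5
theta_merged = theta_pre + lam * (tau_a + tau_b)

# Negation: subtract a task vector to suppress the corresponding behavior.
theta_forget_a = theta_pre - lam * tau_a
```

A difference vector generalizes tau to the displacement between any two points visited during optimization, not only the fine-tuned and pre-trained endpoints.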
Semi-Supervised Multi-Task Learning for Interpretable Quality Assessment of Fundus Images
Telesco, Lucas Gabriel, Nejamkin, Danila, Mata, Estefanía, Filizzola, Francisco, Wignall, Kevin, Troilo, Lucía Franco, Cenoz, María de los Angeles, Thompson, Melissa, Leguía, Mercedes, Larrabide, Ignacio, Orlando, José Ignacio
Retinal image quality assessment (RIQA) supports computer-aided diagnosis of eye diseases. However, most tools classify only overall image quality, without indicating acquisition defects to guide recapture. This gap is mainly due to the high cost of detailed annotations. In this paper, we aim to mitigate this limitation by introducing a hybrid semi-supervised learning approach that combines manual labels for overall quality with pseudo-labels of quality details within a multi-task framework. Our objective is to obtain more interpretable RIQA models without requiring extensive manual labeling. Pseudo-labels are generated by a Teacher model trained on a small dataset and then used to fine-tune a pre-trained model in a multi-task setting. Using a ResNet-18 backbone, we show that these weak annotations improve quality assessment over single-task baselines (F1: 0.875 vs. 0.863 on EyeQ, and 0.778 vs. 0.763 on DeepDRiD), matching or surpassing existing methods. The multi-task model achieved performance statistically comparable to the Teacher for most detail prediction tasks (p > 0.05). In a newly annotated EyeQ subset released with this paper, our model performed similarly to experts, suggesting that pseudo-label noise aligns with expert variability. Our main finding is that the proposed semi-supervised approach not only improves overall quality assessment but also provides interpretable feedback on capture conditions (illumination, clarity, contrast). This enhances interpretability at no extra manual labeling cost and offers clinically actionable outputs to guide image recapture.
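The hybrid objective described above, a supervised loss on manual overall-quality labels plus down-weighted losses on Teacher pseudo-labels for quality details, can be sketched numerically. Everything below is an illustrative assumption (toy logits, a hypothetical weight lam, random stand-in labels), not the authors' implementation:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
n = 4
overall_logits = rng.normal(size=(n, 3))        # e.g. good / usable / reject
detail_logits = {k: rng.normal(size=(n, 2))     # e.g. acceptable / defective
                 for k in ("illumination", "clarity", "contrast")}

y_overall = rng.integers(0, 3, size=n)          # manual overall-quality labels
pseudo = {k: rng.integers(0, 2, size=n)         # Teacher-model pseudo-labels
          for k in detail_logits}

# Supervised overall loss plus down-weighted pseudo-label detail losses.
lam = 0.5
loss = cross_entropy(overall_logits, y_overall)
loss += lam * sum(cross_entropy(detail_logits[k], pseudo[k]) for k in detail_logits)
```

The down-weighting factor reflects that pseudo-labels are noisier than manual annotations; the paper's finding that pseudo-label noise aligns with expert variability suggests the detail terms remain informative despite it.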
To Reviewer 1:
C1: The main weakness of this work is the complexity of the approach.
R1: We do agree that our approach is complex and involves "multiple approximation steps".
C2: On Park1, the NN "finds the global optimum after one query point". How is this significant?
R2: First, the quality of the query point very much depends on the accuracy of the surrogate model.
C3: Details about hyper-parameter selection; no liberty to choose a held-out dataset in practice.
R3: We optimized the hyper-parameters to minimize the average test error. We will supplement these details.
Effective Multi-Task Learning for Biomedical Named Entity Recognition
Ruano, João, Correia, Gonçalo M., Barreiros, Leonor, Mendes, Afonso
Biomedical Named Entity Recognition presents significant challenges due to the complexity of biomedical terminology and inconsistencies in annotation across datasets. This paper introduces SRU-NER (Slot-based Recurrent Unit NER), a novel approach designed to handle nested named entities while integrating multiple datasets through an effective multi-task learning strategy. SRU-NER mitigates annotation gaps by dynamically adjusting loss computation to avoid penalizing predictions of entity types absent in a given dataset. Through extensive experiments, including a cross-corpus evaluation and human assessment of the model's predictions, SRU-NER achieves competitive performance in biomedical and general-domain NER tasks, while improving cross-domain generalization.
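SRU-NER's key trick, adjusting the loss so that a model is not penalized for predicting entity types absent from a given dataset's annotation scheme, can be sketched with a masked per-type loss. The framing below (per-type binary tagging with an annotation mask) is a simplified assumption for illustration, not the authors' slot-based recurrent decoder:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Toy setup: 5 tokens, 3 entity types, framed as per-type binary tagging.
logits = np.array([[ 2.0, -1.0,  0.5],
                   [-1.5, -0.5,  1.0],
                   [-0.5,  2.5, -1.0],
                   [ 0.0, -2.0,  0.5],
                   [ 1.5, -1.0, -0.5]])
targets = np.array([[1., 0., 0.],
                    [0., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 0.],
                    [1., 0., 0.]])

# This corpus annotates only types 0 and 1; type 2 is unannotated, so its
# predictions must not be penalized (an unlabeled mention is not a negative).
annotated = np.array([1., 1., 0.])

p = sigmoid(logits)
bce = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
loss = (bce * annotated).sum() / (annotated.sum() * len(logits))

# Gradient of masked BCE w.r.t. each logit is (p - target) times the mask,
# so the unannotated type contributes exactly zero gradient.
grad = (p - targets) * annotated / (annotated.sum() * len(logits))
```

Because the masked type receives zero gradient, a model trained jointly on several corpora can still learn that type from the corpora that do annotate it.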
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Lee, Yeoreum, Jung, Jinwook, Baik, Sungyong
ABSTRACT Large-scale deep learning models under the pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have been devoted to merging these large models into a single multi-task model, particularly via simple arithmetic on parameters. Such merging faces a central challenge: interference between model parameters fine-tuned on different tasks. A few recent works have designed new fine-tuning schemes that reduce parameter interference, but at the cost of the performance of each task-specific fine-tuned model, thereby limiting that of the merged model. To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on its corresponding task. In this work, we design a new fine-tuning objective function that works towards these two goals. In the process, we find this objective function to be strikingly similar to the sharpness-aware minimization (SAM) objective, which aims to achieve generalization by finding flat minima. Drawing upon this observation, we propose to fine-tune pre-trained models via sharpness-aware minimization. Experimental and theoretical results showcase the effectiveness and orthogonality of our approach, improving performance upon various merging and fine-tuning methods. Recent successes of the pretraining-finetuning paradigm have given rise to a burst of task-specific open-source models in communities such as Hugging Face. The diversity yet ready availability of large task-specific models has naturally elicited a question from researchers: can we combine these large models into one while retaining the performance on each task?
Traditionally, a single multi-task model is obtained by jointly training on data across all tasks (Caruana, 1997; Crawshaw, 2020; Vandenhende et al., 2022). However, given the size of foundation models and the number of tasks, joint training on all of them incurs significant computational costs, which makes merging already fine-tuned models an attractive alternative. Yet a central challenge remains: parameters of different task-specific models interfere or conflict with each other, leading to performance degradation of the merged multi-task model on each task.
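The SAM objective that the abstract invokes, min over theta of the maximum of L(theta + eps) over perturbations with norm at most rho, is typically approximated with a two-step update: a normalized ascent step to a worst-case neighbor, then a descent step at that neighbor applied to the original weights. The sketch below uses a toy quadratic loss with an analytic gradient and illustrative rho and lr values; it is a generic SAM sketch, not the paper's fine-tuning code:

```python
import numpy as np

def grad(w):
    """Analytic gradient of the toy quadratic loss L(w) = sum(w**2)."""
    return 2 * w

w = np.array([1.0, -2.0])
rho, lr = 0.05, 0.1

# Step 1: ascend to the (approximate) worst-case neighbor in the rho-ball,
# using the first-order solution eps = rho * g / ||g||.
g = grad(w)
eps = rho * g / (np.linalg.norm(g) + 1e-12)

# Step 2: take the descent step at the perturbed point, applied to w itself.
w = w - lr * grad(w + eps)
```

Because the gradient is evaluated at the perturbed point, sharp minima (where a small eps changes the loss a lot) are penalized, steering fine-tuning toward the flat regions the paper associates with lower parameter interference.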
Efficient Multi-Task Modeling through Automated Fusion of Trained Models
Zhou, Jingxuan, Bao, Weidong, Wang, Ji, Zhong, Zhengyi, Zhang, Dayu
Although multi-task learning is widely applied in intelligent services, traditional multi-task modeling methods often require customized designs based on specific task combinations, resulting in a cumbersome modeling process. Inspired by the rapid development and excellent performance of single-task models, this paper proposes an efficient multi-task modeling method that can automatically fuse trained single-task models with different structures and tasks to form a multi-task model. As a general framework, this method allows modelers to simply prepare trained models for the required tasks, simplifying the modeling process while fully utilizing the knowledge contained in the trained models. This eliminates the need for excessive focus on task relationships and model structure design. To achieve this goal, we consider the structural differences among various trained models and employ model decomposition techniques to hierarchically decompose them into multiple operable model components. Furthermore, we have designed an Adaptive Knowledge Fusion (AKF) module based on Transformer, which adaptively integrates intra-task and inter-task knowledge based on model components. Through the proposed method, we achieve efficient and automated construction of multi-task models, and its effectiveness is verified through extensive experiments on three datasets.